
MAVEN: Multi-Agent Variational Exploration

Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, Shimon Whiteson

Neural Information Processing Systems

However, two key challenges stand between cooperative MARL and such real-world applications. First, scalability is limited by the fact that the size of the joint action space grows exponentially in the number of agents. Second, while the training process can typically be centralised, partial observability and communication constraints often mean that execution must be decentralised, i.e., each agent can condition its actions only on its local action-observation history, a setting known as centralised training with decentralised execution (CTDE).
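The exponential growth of the joint action space mentioned above is easy to quantify. A minimal sketch (the function name is an illustrative assumption, not from the paper):

```python
def joint_action_space_size(n_agents, n_actions_per_agent):
    # With n agents each choosing among |A| actions, the joint action
    # space has |A|**n elements, i.e. exponential in the number of agents.
    return n_actions_per_agent ** n_agents
```

For example, 8 agents with 5 actions each already yield 5**8 = 390,625 joint actions, which is why value-based MARL methods factorise the joint value rather than enumerate it.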


Super Hard

Neural Information Processing Systems

We thank all the reviewers for their feedback. All reviewers are concerned with whether we substantially outperform QMIX. Since StarCraft II experiments take a long time, we could not include all the results in the submission. Samvelyan et al. classify the maps as Easy, Hard & Super Hard. Results on several maps are shown below.




Subset Selection Based On Multiple Rankings in the Presence of Bias: Effectiveness of Fairness Constraints for Multiwinner Voting Score Functions

Boehmer, Niclas, Celis, L. Elisa, Huang, Lingxiao, Mehrotra, Anay, Vishnoi, Nisheeth K.

arXiv.org Artificial Intelligence

We consider the problem of subset selection where one is given multiple rankings of items and the goal is to select the highest ``quality'' subset. Score functions from the multiwinner voting literature have been used to aggregate rankings into quality scores for subsets. We study this setting of subset selection problems when, in addition, rankings may contain systemic or unconscious biases toward a group of items. For a general model of input rankings and biases, we show that requiring the selected subset to satisfy group fairness constraints can improve the quality of the selection with respect to unbiased rankings. Importantly, we show that for fairness constraints to be effective, different multiwinner score functions may require a drastically different number of rankings: While for some functions, fairness constraints need an exponential number of rankings to recover a close-to-optimal solution, for others, this dependency is only polynomial. This result relies on a novel notion of ``smoothness'' of submodular functions in this setting that quantifies how well a function can ``correctly'' assess the quality of items in the presence of bias. The results in this paper can be used to guide the choice of multiwinner score functions for the subset selection setting considered here; we additionally provide a tool to empirically enable this.
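The setting described above can be made concrete with a small sketch: a Borda-style score function aggregates the rankings, and a brute-force search returns the best size-k subset satisfying group lower-bound fairness constraints. All names here are illustrative assumptions; the paper studies a family of multiwinner score functions, not this specific one.

```python
from itertools import combinations

def borda_scores(rankings, items):
    # Borda-style aggregation: an item at position p in a ranking of
    # n items contributes n - 1 - p points to its total score.
    n = len(items)
    scores = {i: 0 for i in items}
    for r in rankings:
        for pos, item in enumerate(r):
            scores[item] += n - 1 - pos
    return scores

def select_subset(rankings, items, k, groups, lower_bounds):
    # Brute force over all size-k subsets, keeping only those that
    # meet every group's lower-bound fairness constraint.
    scores = borda_scores(rankings, items)
    best, best_score = None, float("-inf")
    for subset in combinations(items, k):
        if all(sum(1 for i in subset if i in groups[g]) >= lb
               for g, lb in lower_bounds.items()):
            s = sum(scores[i] for i in subset)
            if s > best_score:
                best, best_score = subset, s
    return best
```

If the rankings are biased against a group (here, items "c" and "d"), the unconstrained optimum may exclude it entirely; requiring at least one member of that group forces a different selection:

```python
rankings = [["a", "b", "c", "d"], ["b", "a", "c", "d"]]
items = ["a", "b", "c", "d"]
subset = select_subset(rankings, items, 2,
                       groups={"B": {"c", "d"}},
                       lower_bounds={"B": 1})
# The constrained optimum must include an item from group B.
```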


MAVEN: Multi-Agent Variational Exploration

Mahajan, Anuj, Rashid, Tabish, Samvelyan, Mikayel, Whiteson, Shimon

arXiv.org Machine Learning

Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments [43]. We specifically focus on QMIX [40], the current state-of-the-art in this domain. We show that the representational constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. Furthermore, we propose a novel approach called MAVEN that hybridises value and policy-based methods by introducing a latent space for hierarchical control. The value-based agents condition their behaviour on the shared latent variable controlled by a hierarchical policy. This allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks. Our experimental results show that MAVEN achieves significant performance improvements on the challenging SMAC domain [43].
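The core mechanism in the abstract, agents conditioning their value functions on a shared latent variable that a hierarchical policy fixes for a whole episode, can be sketched in tabular form. This is a minimal illustration under assumed small discrete observation and action spaces, not the paper's neural-network implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

class LatentConditionedAgent:
    # Tabular sketch: Q is indexed by (latent z, observation, action),
    # so each latent value induces a distinct behaviour mode.
    def __init__(self, n_latent, n_obs, n_actions):
        self.q = np.zeros((n_latent, n_obs, n_actions))

    def act(self, z, obs, eps=0.1):
        # Epsilon-greedy over the Q-values of the current latent mode.
        if rng.random() < eps:
            return int(rng.integers(self.q.shape[2]))
        return int(np.argmax(self.q[z, obs]))

def run_episode(agents, latent_policy_probs, n_obs):
    # The hierarchical policy samples one shared latent z per episode;
    # every agent conditions on the same z for the whole episode,
    # giving committed, temporally extended joint exploration.
    z = int(rng.choice(len(latent_policy_probs), p=latent_policy_probs))
    observations = [int(rng.integers(n_obs)) for _ in agents]
    actions = [a.act(z, o) for a, o in zip(agents, observations)]
    return z, actions
```

The key design point is that z is held fixed across an episode rather than resampled per step, so exploration is coordinated in time and across agents instead of being independent per-step noise.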